-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(logfwd): log forwarding implementation II #256
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Jordan! Even if there are some loose ends, this is clearly a good step in the right direction. Here is some early feedback to help shaping it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work so far, I think we need to be a bit more careful around defining a graceful termination path for pullers (go until the end of the ring buffer), gatherers (wait up to a timeout for the pullers to stop, then wait for up to a timeout for the client to flush).
The graceful shutdown path would be used when a service restarts or when pebble shuts down. All cases around reconfiguration can be much more forceful in ending all log forwarding for unwanted targets/services.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Left a few comments, all relatively minor.
f7bd5e6
to
ff53051
Compare
Errors from the gatherer tomb are not being surfaced to the user, as there is nothing checking tomb.Err(). Additionally, a write/flush failure should not bring down the gatherer - it could be e.g. a temporary outage. Better to log the failures rather than exiting the main loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this, Jordan. We're clearly approaching the end of the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really impressive PR. I have learnt a lot from studying your code. I have really tried to give you an in depth review - I hope some things will be useful for you. If not, please discard.
…ch are intended for use by other types
- close ringbuffer on service stop - add some more debug logging
- catch and log error from tomb.Wait()
- add comment about when puller.loop exits
- extra comment for gatherer final flush ctx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Jordan. This is looking great.
There's still one important issue to address, unfortunately:
This avoids runtime panic from calling Go after all goroutines have returned. Also, we need to kill the pullerGroup tomb in Stop() to tell the pullerGroup that it's time to shut down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Left a few very nitty comments (some of which are "take it or leave it" naming suggestions), but I'm happy with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy where this ended up. Thank-you for all the hard work on it.
// This goroutine lives for the lifetime of the pullerGroup. This is so that, | ||
// if needed, we can safely remove all pullers and then add more, without | ||
// causing a panic (tombs can't be reused once all the tracked goroutines | ||
// have finished). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gustavo said this is a nice comment. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the changes. Looks good! +1
@barrettj12 We discussed this at our APAC spec review meeting live with Gustavo and agreed to merge. Merge party! |
Add a loki.Client type that encodes sends Pebble logs to Loki. This implements the logstate.logClient interface introduced in #256.
This PR contains the second part of #165, including the actual mechanics of log forwarding in Pebble. It builds upon #209 and #252. This includes an abstract
logClient
interface but doesn't include the specific Loki/syslog implementations - this will come in later PRs.This is a modification of #218, designed to fix fundamental issues identified with that PR.
Current design
For each log target, there is a single "gatherer", which receives logs from a bunch of services and writes them to a "client". The gatherer runs a control loop in a separate goroutine, which waits to receive logs on a channel, and writes these to the client.
For each service, the gatherer spawns a "log puller". Each puller runs in a separate goroutine, pulling logs from an iterator on the service's ringbuffer. Pulled logs are sent to the gatherer's control loop on the shared channel.
The
logClient
interface represents a client to a specific type of backend. In future PRs, we will addlokiClient
andsyslogClient
types which implementlogClient
.logClient
includes two methodsClient implementations have some freedom about the semantics of these methods.
Write
could add the log to the client's internal buffer, whileFlush
would prepare and send an HTTP request with the buffered logs.Write
could serialise the log directly to the open connection, whileFlush
would be a no-op.Teardown
Gracefully stopping a log gatherer is a complex process with multiple steps.
All this logic is encapsulated in the
gatherer.stop()
method.QA
I've included some sample implementations of
logClient
here. They just print the logs to stdout. These can be used to verify that the log forwarding mechanics work properly.Create a simple logging service, e.g.
and a simple plan using this service
Add the
fake.go
file to the repo.Comment out the following line
pebble/internals/overlord/logstate/gatherer.go
Line 356 in 3e904f9
and replace it with e.g.
You might also want to change the gatherer's tick period:
pebble/internals/overlord/logstate/gatherer.go
Line 32 in 3e904f9
Run Pebble with
and verify the logs are printing to stdout.
JUJU-3776